An Extended Version of the KoKo German L1 Learner Corpus
Abstract
English. This paper describes an extended version of the KoKo corpus (version KoKo4, Dec 2015), a corpus of written German L1 learner texts from three different German-speaking regions in three different countries. The KoKo corpus is richly annotated with learner language features on different linguistic levels such as errors or other linguistic characteristics that are not deficit-oriented, and is enriched with a wide range of metadata. This paper complements a previous publication (Abel et al., 2014a) and reports on new textual metadata and lexical annotations and on the methods adopted for their manual annotation and linguistic analyses. It also briefly introduces some linguistic findings that have been derived from the
Related resources
KoKo: an L1 Learner Corpus for German
We introduce the KoKo corpus, a collection of German L1 learner texts annotated with learner errors, along with the methods and tools used in its construction and evaluation. The corpus contains both texts and corresponding survey information from 1,319 pupils and amounts to around 716,000 tokens. The evaluation of the performed transcriptions and annotations shows an accuracy of orthographic e...
Verb Second in Advanced L2 English: A Learner Corpus Study
The present study examines the interface between syntax and discourse-pragmatics in the production of verb second (V2) structures in a corpus of English texts by advanced L1 German and Dutch speakers. The evidence shows that the residual V2 produced by the learner groups studied is the result of a deficit at the interface rather than the transfer of narrow V2 syntax per se. The analysis offered...
Annotating Orthographic Target Hypotheses in a German L1 Learner Corpus
NLP applications for learners often rely on annotated learner corpora. Thereby, it is important that the annotations are both meaningful for the task, and consistent and reliable. We present a new longitudinal L1 learner corpus for German (handwritten texts collected in grade 2–4), which is transcribed and annotated with a target hypothesis that strictly only corrects orthographic errors, and i...
Hedges in English for Academic Purposes: A Corpus-based study of Iranian EFL learners
Hedges, as tools to express tentativeness and doubt, have been studied in plenty of research papers in the Iranian EFL research setting. However, their use in a learner corpus, portraying Iranian learner English, is in need of more research attention. With this end in view, this study aimed at investigating how Iranian EFL learners who have majored in English-related fields in Iran deployed hed...
What’s Hard? Quantitative Evidence for Difficult Constructions in German Learner Data
1. Introduction Our study is concerned with the identification of 'difficult' structures in the acquisition of a foreign language, which will shed light on theoretical considerations of L2 processing. We argue that – compared to simple vocabulary items or abstract syntactic patterns – structures that contain lexical material as well as categorial variables are especially difficult to acquire. T...